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Abstract — Probabilistic  swarm  guidance  enables  autonomous 
agents  to  generate  their  individual  trajectories  independently 
so  that  the  entire  swarm  converges  to  the  desired  distribution 
shape.  In  contrast  with  previous  homogeneous  or  inhomoge¬ 
neous  Markov  chain  based  approaches  [1],  this  paper  presents 
an  optimal  transport  based  approach  which  guarantees  faster 
convergence,  minimizes  a  given  cost  function,  and  reduces  the 
number  of  transitions  for  achieving  the  desired  formation. 
Each  agent  first  estimates  the  current  swarm  distribution  by 
communicating  with  neighboring  agents  and  using  a  consensus 
algorithm  and  then  solves  the  optimal  transport  problem,  which 
is  recast  as  a  linear  program,  to  determine  its  transition  prob¬ 
abilities.  We  discuss  methods  for  handling  motion  constraints 
and  also  demonstrate  the  superior  performance  of  the  proposed 
algorithm  by  numerically  comparing  it  with  existing  Markov 
chain  based  strategies. 

I.  Introduction 

Swarms  of  autonomous  robots  are  widely  studied  be¬ 
cause  they  can  be  controlled  to  collectively  exhibit  useful 
emergent  behavior  [2]-[5].  Similarly,  swarms  of  hundreds  to 
thousands  of  femtosatellites  (100-gram-class  satellites)  are 
being  developed  for  challenging  formation  flying  missions 
like  synthetic  aperture  radar,  interferometry,  etc  [6].  In  this 
paper,  we  introduce  an  optimal  transport  based  approach 
to  develop  distributed  probabilistic  guidance  algorithms  for 
swarms  of  autonomous  robots  or  spacecraft.  Since  determin¬ 
istic  algorithms  of  controlling  individual  agents  scale  poorly, 
we  probabilistically  control  the  swarm  density  distribution  of 
a  large  number  of  index-free  agents. 

Probabilistic  swarm  guidance  is  centered  on  the  idea  of 
controlling  the  swarm  density  distribution  in  a  distributed 
manner  so  that  each  autonomous  agent  or  robot  determines 
its  own  trajectory  in  an  independent  manner  while  the  entire 
swarm  converges  to  the  desired  distribution.  Each  agent 
transitions  using  a  homogeneous  Markov  chain  synthesized 
to  achieve  the  desired  formation  as  its  stationary  distribution 
[7]— [10] .  However,  our  prior  work  [11]  pointed  out  that  the 
homogeneous  Markov  chains  approach  has  the  drawback  of 
driving  agents  to  other  bins  even  after  the  overall  swarm  has 
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Figure  L  (a)  The  initial  probability  mass  function  (pmf)  fi(x)  (in  blue)  is 
transported  to  the  desired  pmf  v(y)  (in  green)  while  minimizing  the  cost 
function  c(x,y)  =  ||a?  —  2/II2-  The  optimum  transference  plan  7 (x,y)  is 
shown  in  gray,  (b)  Another  view  of  the  optimum  transference  plan  7 (x,y) 
from  fi(x)  to  v(y),  where  the  colorbar  represents  the  transported  mass. 

sufficiently  converged  to  the  desired  formation.  This  issue 
is  solved  in  [1]  by  synthesizing  an  inhomogeneous  (time- 
varying)  Markov  chain  so  that  the  agents  settle  down  when 
the  swarm  converges  to  the  desired  formation.  Moreover, 
these  Markovian  approaches  are  robust  to  external  distur¬ 
bances  or  damages  to  the  formation. 

The  main  limitation  of  using  Markov  chains  is  that  multiple 
iterations  are  necessary  before  the  swarm  finally  converges  to 
the  desired  formation.  This  results  in  significant  wastage  of 
control  effort  (e.g.  fuel)  for  achieving  the  desired  formation. 
Moreover,  the  Markov  chain  based  approach  does  not  give  the 
option  of  optimizing  trajectories  for  a  given  cost  function.  On 
the  other  hand,  optimal  transport  is  used  to  find  the  optimum 
transference  plan  7 (x,y)  from  an  initial  distribution  fi{x)  to 
the  desired  distribution  u(y)  with  respect  to  the  cost  function 
c(x,y)  as  shown  in  Fig.  1.  In  this  paper,  we  develop  the 
new  probabilistic  swarm  guidance  algorithm  using  optimal 
transport  (PSG-OT)  to  find  optimal  trajectories  for  each  agent 
in  a  distributed  manner  so  that  the  given  cost  function  is 
minimized.  This  algorithm  also  guarantees  faster  convergence 
and  reduces  the  number  of  transitions  required  to  attain  or 
maintain  the  formation.  This  is  achieved  by  each  agent  first 
estimating  the  current  swarm  distribution  and  then  solving  the 
optimal  transport  problem  (OTP)  to  determine  its  transition 
probabilities.  Since  we  control  the  swarm  density  distribution 
in  a  distributed  manner,  this  algorithm  is  also  robust  to 
external  disturbances  or  damages  to  the  formation. 

Optimal  transport  is  used  for  transportation,  resource  al¬ 
location,  assignment,  etc  [12]— [14].  The  discrete  OTP  is 
solved  using  linear  programming  (LP)  [15],  [16].  In  [17]- 
[19],  LP  is  used,  in  a  centralized  manner,  for  formation  flying 
applications  and  path  planning  of  robots.  In  contrast,  in  this 


Figure  2.  Flowchart  of  PSG-OT  describing  the  key  steps  for  a  single  agent 
in  a  single  time  step. 

paper  we  solve,  in  a  distributed  manner,  the  discrete  OTP  of 
probabilistic  guidance  of  swarms  using  LP. 

The  solution  of  the  probabilistic  guidance  algorithm  es¬ 
tablishes  the  target  bin  for  each  agent.  Collision  avoidance 
algorithms  are  necessary  for  generating  collision-free  guid¬ 
ance  trajectories  from  the  present  location  to  the  target  bin. 
A  recent  survey  paper  [20]  compares  a  number  of  existing 
collision  avoidance  algorithms  based  on  model  predictive 
control,  boundary  following  algorithms,  artificial  potential 
fields,  etc.  The  guidance  trajectories  obtained  from  PSG-OT 
can  be  implemented  using  sequential  convex  programming 
and  model  predictive  control  [21],  [22].  In  this  paper,  we 
present  a  novel  distributed  collision  avoidance  algorithm  for 
multiagent  path  planning  by  modifying  the  coverage  control 
algorithm  using  Lloyd’s  descent  [23].  The  integrated  collision 
avoidance  algorithm,  working  in  conjunction  with  the  bin- 
based  solution  of  the  probabilistic  guidance  algorithm,  results 
in  a  complete  collision-free  solution  for  the  probabilistic 
swarm  guidance  problem. 

A.  Overview  of  PSG-OT  and  Contributions 

The  key  idea  of  the  proposed  PSG-OT  is  to  independently 
determine  the  optimal  trajectory  of  each  agent  using  an 
optimal  transport  based  approach  so  that  the  swarm  density 
distribution  converges  to  the  desired  formation.  The  flowchart 
for  the  algorithm  is  shown  in  Fig.  2  and  its  pseudo  code  is 
given  in  Algorithm  1.  In  step  1,  each  agent  determines  its 
present  location,  as  discussed  in  Section  II-A.  During  step  2, 
each  agent  estimates  the  current  swarm  distribution  by  first 
making  a  localized  guess  and  then  reaching  an  agreement 
across  the  network  using  the  consensus  algorithm  [24].  Step 
2  is  elucidated  in  Section  II-B. 

Step  3  involves  solving  the  OTP  using  LP  to  obtain  the 
transition  probabilities  to  other  bins  for  the  current  time 
step.  The  existence  and  properties  of  the  OTP  solution  are 
presented  in  Section  III.  Random  sampling  of  the  optimal 
transport  solution  generates  the  next  bin  for  the  agent,  as 
shown  in  step  4.  Step  5  involves  checking  if  the  transi¬ 
tion  determined  using  by  the  OTP  solution  satisfies  motion 
constraints.  Methods  for  handling  motion  constraints  are 
presented  in  Section  IV.  Finally,  in  step  6,  the  agent  goes  to 
the  next  bin  while  avoiding  collision,  as  discussed  in  Section 
V. 

The  first  contribution  of  this  paper  is  to  solve  the  discrete 
OTP  in  a  distributed  manner.  A  centralized  node  could  solve 


an  OTP  using  the  exact  location  of  each  agent  and  allocate  the 
final  positions  to  each  agent.  Instead,  in  PSG-OT,  each  agent 
solves  an  equivalent  OTP  using  its  best  estimate  of  the  current 
swarm  distribution  and  then  determines  its  trajectory  in  an 
independent  manner.  Note  that  the  equivalent  OTP  solved  by 
each  agent  has  n2Qll  variables  compared  to  the  centralized 
OTP  with  m2  variables,  where  nce n  is  the  number  of  bins, 
m  is  the  number  of  agents  and  m  nce n. 

The  second  contribution  of  this  paper  is  to  use  optimal 
transport  for  the  problem  of  probabilistic  swarm  guidance. 
We  show  that  this  approach  not  only  minimizes  a  given  cost 
function,  but  also  guarantees  faster  convergence  and  reduces 
the  number  of  transitions  for  achieving  and  maintaining  the 
formation,  while  obeying  motion  constraints. 

The  third  contribution  of  this  paper  is  a  novel  distributed 
collision  avoidance  algorithm,  which  is  obtained  by  modify¬ 
ing  the  Voronoi  partitioning  based  Lloyd’s  descent  algorithm 
[23],  which  is  suitable  for  the  bin-based  architecture  of  this 
paper.  This  is  discussed  in  Section  V.  Hence  this  paper 
presents  a  complete  collision-free  solution  for  probabilis¬ 
tic  swarm  guidance  problem,  by  integrating  the  collision 
avoidance  algorithm  with  the  OTP  solution.  Strategies  for 
implementing  PSG-OT  and  comparison  with  other  existing 
Markov  chain  based  approaches  are  discussed  in  Section  VI. 

B.  Notation 

The  time  index  is  denoted  by  a  right  subscript  and  the 
agent  index  is  denoted  by  a  lower-case  right  superscript.  Let 
N,  M,  M+  and  Mmxn  be  the  sets  of  natural  numbers  (positive 
integers),  real  numbers,  positive  real  numbers  and  m  by  n 
matrices  respectively.  Let  1  =  [1, 1, . . . ,  1]T,  I,  and  0  be 
the  ones  (column)  vector,  the  identity  matrix  of  appropriate 
size,  and  the  empty  set  respectively.  The  symbols  |-|,  [•],  and 
||  •  ||  p  denote  the  absolute  value  or  the  cardinality  of  a  set, 
ceiling  function,  and  the  £p  vector  norm  respectively.  The 
set  Cp(/jl)  denotes  the  set  of  all  functions  /(sc)  :  Rnx  -A  M 
with  the  bounded  integral  \f(x)\pd/jJ(x))1^P,  where  /i  is 
a  measure.  For  /,  g  :  N  -A  M,  we  say  that  /  G  0(g)  if  there 
exists  no  G  N  and  k  G  M+  such  that  |/(n)|  <  k\g(n)\  for  all 
n  >  no. 

II.  Problem  Formulation  and  Preliminaries 

In  this  section,  we  first  present  the  problem  statement  and 
then  present  a  distributed  algorithm  for  estimating  the  current 
swarm  distribution. 

A.  Problem  Statement 

The  nx -dimensional  compact  physical  space  over  which 
the  swarm  is  distributed  is  denoted  by  1Z  C  Mn£C  and 
partitioned  into  nceii  disjoint  convex  bins  represented  by 
R[i],  i  =  1 , . . . ,  ncen .  Let  m  G  N  agents  belong  to  the 
swarm,  where  m  nce n.  We  assume  that  each  agent  can 
determine  its  actual  location  in  1Z,  which  is  denoted  by 
pk  G  Rnx.  The  row  vector  r°k  represents  the  actual  bin  in 
which  the  jth  agent  is  present  at  the  kth  time  instant,  i.e.,  if 
the  jth  agent  is  in  bin  R[i]  (i.e.,  pk  G  R[i])  then  rk[i\  =  1. 
The  current  swarm  distribution  (J~k\  which  is  a  row  vector, 


is  given  by  the  ensemble  mean  of  actual  agent  positions, 
i.e.,  Tk  :=  E  YYjL  1  rk •  Each  agent  in  a  particular  bin  can 
only  transition  to  some  bins,  due  the  dynamics  or  physical 
constraints,  which  are  captured  by  the  motion  constraints 
matrix  Ak.  Let  us  now  define  the  desired  formation  and  the 
cost  function. 

Definition  1.  ( Desired  Formation  7r)  The  desired  formation 
shape  is  represented  by  a  probability  (row)  vector  7 r  G  Mnce11 
over  the  bins  in  7 Z,  i.e.,  7rl  =  1.  Note  that  7r  can  be  any 
arbitrary  probability  vector,  but  it  is  the  same  for  all  agents 
within  the  swarm.  In  the  presence  of  motion  constraints,  7 r 
needs  to  satisfy  Assumption  1  on  page  5.  For  example,  7r  is 
the  pmf  v(y)  in  Fig.  1.  □ 

Definition  2.  (Cost  Function  ck)  The  (possibly  time-varying) 
cost  of  transitioning  between  bins  is  represented  by  the  matrix 
Ck  G  Mncenxnceii?  where  each  element  ck  [i,  £]  is  the  cost  of 
transporting  an  agent  from  bin  R[i]  to  bin  R[£\.  For  the 
existence  of  optimal  transference  plans,  c k  should  satisfy 
Theorem  1  on  page  3.  For  example,  c(x,y)  =  \\x  —  y\\ 2 
in  Fig.  1.  □ 

The  objectives  of  probabilistic  swarm  guidance  using  op¬ 
timal  transport  (PSG-OT)  running  onboard  each  agent  are  as 
follows: 

1)  Determine  the  optimal  trajectory  of  each  agent  with 
respect  to  given  cost  function  (c^),  which  obeys  motion 
constraints  ( AJk ),  so  that  the  overall  swarm  converges 
to  a  desired  formation  (7r). 

2)  Maintain  the  swarm  distribution  and  automatically  de¬ 
tect  and  repair  damages  to  the  formation. 

B.  Estimation  of  Current  Swarm  Distribution  using  Consen¬ 
sus 

In  our  prior  work  [1],  the  consensus  algorithm  [24]-[27] 
is  used  to  estimate  the  current  swarm  distribution  (Pk)  in 
a  distributed  manner.  We  follow  the  same  approach  in  this 
paper.  Let  the  row  vector  jFk  v  represent  the  jth  agent’s 
estimate  of  the  current  swarm  distribution  during  the  oth 
consensus  loop  at  the  kth  time  instant.  Each  agent  first  locally 
estimates  the  swarm  distribution,  i.e.,  Pk  x  =  r°k.  Then  the 
agents  recursively  combine  their  local  estimates  with  their 
neighboring  agents  using  the  Linear  Opinion  Pool  (LinOP) 
of  probability  measures  [28],  [29]: 

H,U+ 1  =  E  akH,^  W  e  {!)•••> m}, Vv  e  N,  (1) 

where  Jk  is  the  set  of  inclusive  neighbors  of  the  jth  agent 
and  °^k  =  E  The  matrix  P k,  with  entries  Pk[l,j]  = 

,  conforms  with  the  time-varying  communication  network 
topology  of  the  multi-agent  system  Qk.  Since  m  ncen 
and  multiple  agents  are  within  the  same  bin,  it  is  guaranteed 
almost  surely  using  Erdos-Renyi  random  graphs  or  random 
nearest-neighbor  graphs  that  Qk  is  strongly  connected  (SC) 
[30]-[32].  Moreover,  distributed  algorithms  exist  for  the 
agents  to  generate  SC  balanced  graphs  [33]-[35]. 

If  Qk  is  SC  and  balanced,  then  each  agents  local  es¬ 
timate  Pk  v  converges  globally  exponentially  converges  to 


the  global  estimate  of  the  current  swarm  distribution  Tk  = 
YZ'iLi  m^k  1  pomtwise  with  a  rate  faster  or  equal  to  the 
second  largest  singular  value  of  Pk,  i.e.,  crm_i(P/c)  [1].  For 
some  £cons  >  0,  if  the  number  of  consensus  loops  within 

,  .  ^  rin(£cons/(2v/m)) 

each  consensus  stage  nioop  >  1  — - - 

agent  has  a  good  estimate  of 
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Tk  [i]  | .  In  this  paper,  we  assume  £COns  <  ^  and  nioop  is 
computed  accordingly. 


III.  Optimal  Transport  Problem 

The  key  concept  of  PSG-OT  is  that  each  agent  can 
independently  determine  its  optimal  trajectory  by  solving  an 
OTP  using  LP  so  that  the  swarm  converges  to  the  desired 
formation.  In  the  absence  of  motion  constraints,  the  objective 
of  this  OTP  is  to  transport  the  swarm  from  its  estimated 
current  distribution  (jFl  „  )  to  the  desired  formation  hv) 

I*  ?  nioop  x 

while  minimizing  the  cost  function  (c&). 

Let  n[i\  G  Rnx  represent  the  centroid  of  the  convex 
bin  R[i\.  The  Dirac  measure  on  x  G  Mrix  is  the  measure 
5x(k,[i\)  =  1  if  x  =  n[i\  and  0  otherwise.  Then  the 
measure  induced  by  the  jth  agent’s  estimated  current  swarm 
distribution  (Tk  )  on  7Z  is  given  by: 

^cell 

=  E nioop  •  (2) 

i= 1 

Similarly,  for  y  G  Wlx,  the  measure  induced  by  the  desired 
formation  (7r)  on  1Z  is  given  by: 

^cell 

vk{y)  =  E7rH<Ji/(K[i])  •  (3) 

i=  1 

The  OTP  is  represented  as  the  Monge-Kantorovich  mini¬ 
mization  problem  [13]: 

7fe(*,y)=arg  inf  /  ck(x,y)dw(x,y) ,  (4) 

Jnxn 

where  the  infimum  runs  over  all  joint  probability  measures 
w(x,y)  on  1Z  x  1Z  with  marginals  yk(x)  and  Vk(y)-  The 
joint  measures  or  couplings  that  achieve  the  optimum  are 
called  optimum  transference  plans  (7 3k(x,y)).  The  follow¬ 
ing  theorem  gives  conditions  for  the  existence  of  optimum 
transference  plans. 

Theorem  1.  [13,  Theorem  4.1,  pp.  43]  ( Existence  of  an 

optimal  coupling )  Let  a(x)  G  Li(yk),  b(y)  G  Li(yif)  be 
two  upper  semi- continuous  functions  and  Ck(x ,  y)  be  a  lower 
semi-continuous  cost  function,  such  that  Ck(x,y)  >  a(x)  + 
b(y),Vx,y.  Then  there  exists  an  optimal  coupling  7 3k(x,y) 
which  minimizes  the  total  cost  fnxnCk(x,y)dw(x,y) 
among  all  possible  couplings  w (x,y)  on  1Z  x  1Z  with 
marginals  p3k(x)  and  vk(y). 

In  this  paper,  we  use  the  1 2  distance  between  the  centroids 
of  bins  as  the  cost  function,  because  it  satisfies  Theorem  1 : 
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ck[i,£\  =  ||  K,[i\  -  k[£]\\ 2,Vi,£  G  {1, . . . ,  nceii},  Vfc  G  N. 

(5) 

Moreover,  the  optimum  transference  plan  y]k  has  the  follow¬ 
ing  important  properties  [13]: 

•  Cyclical  monotonicity:  It  is  not  possible  to  improve  the 
incurred  cost  by  rerouting  masses  along  some  cycle. 

•  Duality:  The  problem  of  minimizing  cost  of  transporta¬ 
tion  has  a  dual  Kantorovich  problem  where  the  objective 
is  to  maximize  profits. 

•  Restriction:  Given  an  optimum  transference  plan,  any 
induced  sub-plan  (transferring  part  of  the  initial  mass  to 
part  of  the  final  mass)  is  optimal  too. 

•  Shortening  principle:  Two  distinct  optimal  transference 
trajectories  do  not  intersect,  except  at  endpoints. 

The  discrete  OTP  can  also  be  written  in  the  following  form, 
where  jk  G  MncellXnce11  is  the  optimum  transference  plan: 

^cell  ^cell 

(6) 

i= 1  1=1 

™cell 

subject  to  ^7 =  ^fe,„loopW>  e  {l,...,ncen} 

i=l 

^cell 

5^7 £M]=7t[4  it  G  {l,...,nceii} 

i=  1 

7&M]  ^  °>  G  {1, . . . ,  rzceii} 


This  problem  can  be  solved  using  LP  in  time  0(nlnq  + 
eo(y/nv  log  nv)),  where  Uv  an(i  Uq  are  the  number  of  variables 
and  linear  inequalities  respectively  [17]. 

The  optimal  transference  plan  7^,  which  is  the  solution  of 
the  discrete  OTP,  can  also  be  interpreted  in  the  Markovian 
sense.  For  a  non-empty  bin  R[i\,  the  probability  of  the  jth 
agent  transitioning  from  bin  R[i\  to  bin  R[q\  during  the  kth 
time  instant  is  given  by: 


P(r-i+1M  =  lK[i]  =  l) 


7 jM 


The  Markov  matrix,  representing  the  transition  probabilities 
of  the  jth  agent  at  the  kth  time  instant  given  by  (7),  does  not 
have  7r  as  its  stationary  distribution.  Hence  it  is  dissimilar 
from  the  existing  approaches  [1],  [10]  utilizing  the  stationary 
distribution  of  the  Markov  chain  to  achieve  the  desired 
formation.  But  it  is  due  to  this  Markovian  interpretation  that 
PSG-OT  is  also  robust  to  external  disturbances  or  damages 
to  the  formation. 


IV.  Motion  Constraints 

In  this  section,  additional  constraints  on  the  motion  of 
agents  are  considered  and  the  cost  function  in  PSG-OT  is 
modified  to  handle  them.  The  agents  in  a  particular  bin  can 
only  transition  to  some  bins  but  cannot  transition  to  other 
bins  because  of  the  dynamics  or  physical  constraints.  These 
(possibly  time- varying)  motion  constraints  are  specified  by 
the  matrix  A°k  as  follows  [1]: 


Figure  3.  (a)  Let  the  red  region  denote  the  set  II.  The  motion  constraints 
are  such  that  the  agents  can  only  transition  to  the  immediate  neighboring 
bins.  The  two  corner  regions  form  the  two  subsets  of  II,  such  that  any  agent 
cannot  transition  from  one  subset  to  another  subset  without  exiting  the  set 
II.  (b)  The  letters  UIUC  (in  red)  is  the  desired  formation  7V  in  Section  VI. 
Although  7 Z  is  partitioned  into  900  bins,  there  are  about  448  non-empty 
bins. 


4M] 


1  if  the  jth agent  can  transition  to  R{£\ 

0  if  the  jth agent  cannot  transition  to  R[£\ 


where  rjk[i\  =  1,  W  G  {1, . . . ,  nceii}  .  (8) 


It  is  assumed  that  the  graph  conforming  to  the  A3k  matrix  is 
strongly  connected  and  the  A°k  matrix  satisfies  Assumption 
1  on  page  5. 

In  this  paper,  we  capture  the  effect  of  these  motion 
constraints  by  modifying  the  cost  function  (5).  Let  us  choose 
a  positive  constant  cmax  such  that  cmax  >>  Ck[i,  £],  Vi,  £  G 
{1, . . . ,  ncen},  Vfc  G  N.  The  modified  cost  function  that 
captures  the  effect  of  the  motion  constraints  is  given  by: 


ck[i,t] 


Cmax  if  =  0’ 


where  r{  [i]  =  1,  it  G  {1, . . . ,  nceu},  ik  GN .  (9) 


In  the  presence  of  motion  constraints,  this  modified  cost 
function  (9)  is  used  in  the  discrete  OTP  (6).  Note  that  the 
solution  of  the  discrete  OTP  with  the  original  cost  function 
Ck  (5)  is  a  feasible  solution  for  the  discrete  OTP  with  the 
modified  cost  function  Ck  (9).  Hence,  the  existence  of  an 
optimal  solution  for  the  discrete  OTP  (6)  with  the  modified 
cost  function  Ck  (9)  is  guaranteed  because  the  existence  of  at 
least  one  feasible  solution  guarantees  existence  of  an  optimal 
solution  [15]. 

Let  us  define  n  as  the  set  of  all  bins  that  have  nonzero 
probabilities  in  7V.  For  a  bin  R[i\,  let  us  define  Ak(R[i])  as 
the  set  of  all  bins  that  the  jth  agent  can  transition  to  at  the  kth 
time  instant.  If  the  jth  agent  is  actually  located  in  bin  R[i\ 
and  we  observe  that  Ak(R[i\)  fl  n  =  0  for  all  k  G  N,  then 
the  jth  agent  is  trapped  in  the  bin  R[i\  forever.  Similar  to 
[1],  we  avoid  the  trapping  situation  by  enforcing  a  secondary 
condition  that  the  jth  agent  in  bin  R[i\  with  Ak(R[i])nlL  =  0 
will  transition  to  the  bin  ^3k(R[i])  during  the  kth  time  step, 
where  the  bin  ^f3k(R[i])  G  Ak(R[i])  is  closest  to  the  bins  in 
n  or  does  not  have  this  trapping  problem.  In  all  other  cases, 
if  the  solution  to  the  OTP  orders  the  agent  to  transition  to 
a  bin  that  is  not  allowed  by  the  motion  constraints,  then  the 
agent  remains  in  its  current  bin. 


In  some  situations,  II  can  be  decomposed  into  subsets,  such 
that  any  agent  cannot  transition  from  one  subset  to  another 
subset  without  exiting  the  set  II  due  to  motion  constraints 
(for  example  see  Fig.  3(a)).  In  PSG-OT,  the  agents  will  only 
transition  within  the  bins  in  II  once  it  has  entered  any  bin 
in  II.  Hence  the  agents  in  one  such  subset  of  n  will  never 
transition  to  the  other  subsets  of  n  because  it  has  to  travel 
through  bins  which  are  not  in  n.  In  order  to  avoid  such 
situations,  we  need  the  following  assumption  on  n  and  A°k. 

Assumption  1.  [1]  Each  agent  can  transition  from  any  bin 
in  n  to  any  other  bin  in  n  without  exiting  the  set  n,  while 
satisfying  the  motion  constraints.  □ 

The  pseudo  code  of  PSG-OT,  with  motion  constraints  and 
collision  avoidance,  is  shown  in  Algorithm  1. 


Algorithm  1  Probabilistic  swarm  guidance  algorithm  using 
optimal  transport  (PSG-OT) 

1: 

(one  cycle  of  jth  agent  during  kth  time  instant) 

2: 

Agent  determines  its  present  location,  e.g.,  pk,  R[i] 

3: 

Set  Tlloop?  Ujj,  ,  and  Tmax 

>1 

4: 

for  is  =  1  to  nioop 

Estimate 

5: 

if  v  —  1  then  Set  PJk  l  =  rjk  end  if 

current 

6: 

Exchange  pmfs  Pfc5i,,W  G  Jl 

swarm 

7: 

Compute  the  new  pmf  J 7k  u  using  (1) 

distribution 

8: 

end  for 

9: 

Compute  the  cost  function  Ck 

}  Eq.  (9) 

10:  Solve  discrete  OTP  to  get  jk 

}  Eq.  (6) 

11 

:  Generate  a  random  number 

2  €  unif  [0;£S7iM]] 

1 

Random 

12:  Choose  bin  R[q\  such  that 

sampling 

EE!7iM<2<ELi7iM 

J 

13 

:  if  A3k[i,  q]  —  1  then  Go  to  bin  R[q] 

1 

Check 

14 

:  else  if  Ajk(R[i\)  fl  II  =  0 

motion 

15 

:  then  Go  to  bin  ^f3k(R[i]) 

constraints 

16 

:  else  Remain  in  present  bin  R[i\  end  if 

J 

17 

.  for  T  —  1  to  Tmax 

18 

:  Exchange  locations  pk  ,  G 

19 

Compute  \'l  T,  VlT ,  and  C(V^T) 

Generate 

20 

if  Kr  =  0’  theI*  PJk,r  + 1  =  PL,t 

collision- 

> 

21 

:  else  if  R[q]  =  R[i\, 

free 

22 

then  piT+1  =  C(Vlr) 

trajectory 

23 

:  else  set  pk  T+1  from  (13)  end  if 

24 

end  for 

V.  Collision-free  Trajectory  Generation 

In  PSG-OT,  collision  avoidance  algorithms  are  necessary 
for  obtaining  the  guidance  trajectory  of  each  agent  from  its 
current  location  to  the  target  bin.  In  this  section,  we  present  a 
new  distributed  collision  avoidance  algorithm,  by  modifying 
the  coverage  control  algorithm,  which  is  suitable  for  the  bin- 
based  architecture  of  this  paper. 

The  coverage  control  algorithm,  which  uses  the  Voronoi 
partitioning  based  Lloyd’s  descent  algorithm  [23],  is  designed 
for  optimally  distributing  multiple  agents  over  a  given  region. 
This  distribution  is  optimal  with  respect  to  a  ^2-norm  based 


cost  function.  An  interesting  property  of  these  Voronoi  par¬ 
titions  is  that,  if  the  given  region  is  a  convex  polytope  in 
an  nv -dimensional  Euclidean  space,  then  the  boundary  of 
each  Voronoi  partition  is  the  union  of  ( nv  —  1) -dimensional 
convex  polytopes  and  each  Voronoi  partition  is  itself  a  nv- 
dimensional  convex  poly  tope. 

In  PSG-OT,  at  every  time  step,  the  agents  in  a  particular 
bin  either  remain  in  the  same  bin  or  transition  to  target  bins. 
Let  us  call  the  agents  that  remain  in  the  same  bin  as  stationary 
agents  and  those  that  transition  to  target  bins  as  transient 
agents.  Let  each  agent  construct  a  Voronoi  partition  in  its 
bin.  The  key  idea  for  collision-free  trajectory  generation  is 
to  direct  each  stationary  agent  to  the  centroid  of  its  Voronoi 
partition.  Each  transient  agent  is  directed  to  the  position  in 
its  Voronoi  partition  that  has  the  minimum  distance  from  its 
target  bin. 

Let  rco i  denote  the  minimum  collision  avoidance  distance 
to  be  maintained  between  agents.  Let  pk  r  represent  the 
actual  location  of  the  jth  agent  during  the  rth  collision 
avoidance  loop  at  the  kth  time  instant.  At  the  beginning 
of  the  collision  avoidance  loop  at  the  kth  time  instant,  the 
actual  location  of  the  jth  agent  is  the  final  location  of 
the  jth  agent  at  the  end  of  the  previous  time  instant,  i.e., 
pk  i  =  p\_xT  where  rmax  is  the  number  of  collision 
avoidance  loops  in  each  time  instant.  The  distributed  collision 
avoidance  algorithm  for  agents  in  bin  R[i\  during  the  rth 
collision  avoidance  loop  at  the  kth  time  instant  is  given  as 
follows: 

1)  Let  agents  {ji,j2  •  •  • ,  jn}  be  the  set  of  all  agents 
present  in  the  bin  R[i\,  i.e.,  each  pJkr,pJkr, . . . ,  p°kr  G 

R[i}.  Let  PktT(R[i])  =  {pi]T,PktT,  ■  ■  ■  ,Pk,T}  represent 
the  actual  locations  of  all  the  agents  in  R[i\. 

2)  Construct  the  Voronoi  partition  V(Pk,r(R[i]))  = 

{Vk\,  Vk2r , . . . ,  VknT}  over  the  bin  R[i\  corresponding 
to  PkiT(R[i]).  For  each  agent  j  G  {ji,  j2  •  •  • ,  jn},  its 
Voronoi  set  (Vkr)  is  given  by: 

Vfc,Tfe  R[i]  :  \\q  ~  PktTh  <  \\q  ~  pi,rh, 

w  e  {ji  ,j2...,jn}  and  j  ±  £}  .  (10) 

3)  Let  dVkr  denote  the  boundary  of  the  Voronoi 
set  Vkr.  Construct  a  modified  Voronoi  partition 
V(Pk,r(Rm  =  {V£r,vfr,...,v&}  such  that  for 
all  j  e  {jl,j2---,jnh 

Kr  =  ^  Kr  ■  ll«  -  9Vlrh  >  ^col}  •  dD 

4)  Compute  the  centroid  C(Vk  r)  of  all  the  sets  in  the 
modified  Voronoi  partition  V(PfcjT(i2[i])),  which  is 
defined  as: 


5)  Depending  on  the  type  of  the  agent,  set  the  new 
location  of  the  agent  as  follows. 

a>  If  K,r  =  0’  then  We  Set  Pfc.r+1  =  P*,t- 


T  =  1 


T  =  2 


T  =  3 


VI.  Numerical  Simulation 


Ligure  4.  Time  evolution  of  four  agents  in  a  bin,  following  the  collision 
avoidance  algorithm,  where  the  transient  agent  (in  red)  goes  from  left-bottom 
to  the  right-top  of  the  bin  and  the  remaining  stationary  agents  (in  blue)  give 
way  to  the  transient  agent.  The  Voronoi  sets  (Vk  ,  Vk  )  of  all  the  agents 
are  also  shown. 

b)  If  the  jth  agent  is  a  stationary  agent,  then  set  the 
new  location  of  the  jth  agent  to  the  centroid  of  its 
modified  Voronoi  set  V^T,  i.e.,  pj>r+1  =  C(V^r). 

c)  If  the  jth  agent  is  a  transient  agent,  then  set  the 
new  location  of  the  jth  agent  as: 

Pk,r+ i=arS  min  (13) 

where  Kk  elZ  represents  the  centroid  of  its  target 
bin. 

6)  If  the  transient  agent  is  close  to  the  boundary  of  the 
bin  R[i\,  then  it  also  constructs  Voronoi  sets  in  the 
neighboring  contiguous  bins.  This  process  allows  it  to 
cross  the  bin  boundary  and  transition  from  bin  R[i] 
to  a  contiguous  bin  which  is  closer  to  its  target  bin. 
When  the  transient  agent  finally  reaches  its  target  bin, 
it  becomes  a  stationary  agent  in  that  target  bin. 

These  steps  are  illustrated  in  Fig.  4.  The  advantage  of 
this  novel  distributed  collision  avoidance  algorithm,  using 
Voronoi  partitions,  is  that  it  guarantees  collision  avoidance. 
Since  each  bin  R[i]  is  convex,  the  Voronoi  set  (Vk  T)  and 
the  modified  set  (Vk  T)  are  also  convex.  As  each  agent  only 
moves  within  its  Voronoi  set,  the  agents  are  guaranteed  to 
avoid  collisions  due  to  the  buffer  region  of  rcoi  along  the 
boundary  of  each  Voronoi  set.  Moreover,  the  process  of 
constructing  Voronoi  sets  is  achieved  in  a  distributed  manner, 
hence  we  have  a  simple,  efficient,  distributed  algorithm  for 
collision  avoidance  using  Voronoi  partitions. 

Although  the  process  of  generating  Voronoi  partitions  is 
computationally  expensive,  it  is  less  than  the  computational 
load  for  solving  the  discrete  OTP  using  LP.  Moreover,  the 
communication  load  is  the  same  as  the  consensus  algorithm. 
Hence  this  algorithm  can  be  executed  using  the  existing 
computational  and  communication  capabilities  of  agents  per¬ 
forming  PSG-OT. 


In  this  section,  we  demonstrate  distributed  swarm  guidance 
using  PSG-OT  and  compare  it  with  other  existing  probabilis¬ 
tic  guidance  techniques.  Let  us  first  introduce  these  Markov 
chain  based  probabilistic  swarm  guidance  algorithms. 


A.  Markov  Chain  based  Swarm  Guidance  Algorithms 

In  probabilistic  swarm  guidance  algorithm  using  inhomo¬ 
geneous  Markov  chains  (PSG-IMC),  each  agent  chooses  the 
tuning  parameter  (££)  based  on  the  Hellinger  distance  (HD) 
between  the  estimated  current  swarm  distribution  (Pi  „  ) 

ri ,  /  Hoop 

and  the  desired  formation  (tt),  using  the  following  equation 

[1]: 


ncell  _  2 

ek  =  DH  :=  J2  (VW\  -  JKZM)  ■ 


(14) 


The  HD  is  a  symmetric  measure  of  the  difference  between 
two  probability  distributions  and  it  is  upper  bounded  by  1 
[36].  Each  agent  then  transitions  according  to  the  following 
time- varying  Markov  matrix  Mk,  with  7r  as  its  stationary 
distribution  (i.e.,  7vM°k  =  7v)  [1]: 

Mk  =  +  1  “  £fcdiag(aJfc),  (15) 

where  7V  a?k  ^  0  and  sup*.  ^Hq^Hoo  <  1.  The  probability 
that  the  jth  agent  in  bin  R[i\  at  the  kth  time  instant  will 
transition  to  bin  R[£]  at  the  (k  +  l)th  time  instant  is  given 
by  Mk[i,£\.  To  ensure  that  the  agents  settle  down  after  the 
desired  formation  is  achieved,  we  see  that  M°k  -A  I  if  £k  -A  0 
[1]. 

Meanwhile,  probabilistic  guidance  algorithm  (PGA)  in¬ 
volves  using  a  homogeneous  Markov  matrix  M,  with  7r  as 
its  stationary  distribution,  for  all  agents  over  all  time  steps 
[10].  We  compare  PSG-OT  with  PGA  and  PSG-IMC  in  the 
next  section. 


B.  Simulation  Results 

We  now  present  the  setup  of  this  simulation  example.  The 
swarm  containing  m  =  5000  agents  is  guided  to  form  the 
desired  formation  7 r  associated  with  the  letters  UIUC  as 
shown  in  Fig.  3(b).  As  shown  in  Fig.  6,  the  simulation  starts 
at  k  =  1  time  instant  with  the  agents  uniformly  distributed 
across  P  C  M2,  which  is  partitioned  into  30  x  30  bins.  During 
the  consensus  stage,  each  agent  is  allowed  to  communicate 
with  those  agents  which  are  at  most  10  steps  away.  Let 
K,[i\  =  (x[i\,  y[i ])  denote  the  location  of  the  bin  R[i\  in  the 
30  x  30  grid  and  the  communication  network  topology  P& 
is  setup  using  Metropolis  weights.  Moreover,  each  agent  is 
allowed  to  transition  to  only  those  bins  which  are  at  most 
5  steps  away.  An  agent  undergoes  a  transition  if  it  jumps 
from  bin  R[i\  to  bin  R[£\,  £  /  i  during  a  given  time  step. 
It  is  desired  that  the  swarm  guidance  algorithm  minimizes 
the  cost  of  transitioning  during  each  time  step.  The  modified 
cost  function  is  given  by  (9). 

Monte  Carlo  simulations  were  performed  to  compare  the 
performance  of  PSG-OT,  illustrated  in  Algorithm  1,  with 
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Figure  5.  Monte  Carlo  simulations  were  performed  to  compare  the  perfor¬ 
mance  of  PSG-OT  with  PGA  and  PSG-IMC.  The  results  from  50  simulation 
runs  along  with  their  3  —  a  error-bars  are  presented  for:  (a)  HD  between 
the  current  swarm  density  distribution  and  the  desired  formation,  (b)  the 
number  of  transitions  per  time  step,  (c)  the  transition  cost  incurred  per  time 
step,  and  (d)  the  total  cost  incurred  till  the  present  time  step.  The  spike  or 
discontinuity  after  the  20th  time  step  is  due  to  the  external  damage  to  the 
middle  section  of  the  swarm.  Note  that  PSG-OT  converges  much  faster  and 
performs  much  better  than  PSG-IMC  and  PGA  in  all  categories. 


Figure  6.  Histogram  plots  of  the  swarm  distribution  at  different  time  instants 
for  5000  agents  executing  PSG-OT.  The  colorbar  represents  the  pmf  of 
the  swarm  distribution.  Starting  form  an  uniform  distribution,  the  swarm 
converges  to  the  desired  formation  within  a  couple  of  time  steps.  The 
middle  section  of  the  swarm  is  externally  damaged  and  1248  agents  are 
removed  after  the  250th  time  step,  but  the  swarm  autonomously  recovers 
within  another  couple  of  time  steps. 


Moreover,  the  HD  between  the  swarm  density  distribution  in 
steady  state  and  the  desired  formation  only  reduces  to  ~  0.12 
for  PSG-IMC  and  PGA  cases  as  compared  to  ^  0.02  in  the 
case  of  PSG-OT.  After  the  250th  time  step,  the  swarm  is 
externally  damaged  by  eliminating  approximately  1000  ±100 
agents  from  the  middle  section  of  the  formation.  This  is  seen 
by  comparing  the  images  for  the  250th  and  251st  time  step 
in  Fig.  6.  Note  that  the  swarm  quickly  recovers  from  this 
damage  and  the  remaining  agents  attain  the  desired  stationary 
distribution  within  another  couple  of  time  steps  in  the  case 
of  PSG-OT,  while  it  takes  approximately  200  time  steps  for 
PSG-IMC  and  PGA  cases. 

As  shown  in  Fig.  5(b),  the  average  number  of  transitions 
is  per  agent  in  500  time  steps  is  1.34  ±  0.08  in  the  PSG- 
OT  case  compared  to  6.2  ±0.1  in  the  PSG-IMC  case  and 
33.7  ±0.1  in  the  case  of  PGA.  Moreover,  it  is  evident  from 
Fig.  5(d)  that  the  total  cost  incurred  by  PSG-OT  ((1.53  ± 
0.18)  x  104)  after  500  time  steps  is  significantly  lesser  than 
that  incurred  by  PSG-IMC  ((9.15  ±  0.44)  x  104)  and  PGA 
((4.80  ±0.06)  x  105).  It  is  also  evident  from  the  slope  of  the 
lines  in  Fig.  5(d)  that  the  steady  state  cost  of  maintaining  the 
formation  is  significantly  higher  for  the  PSG-IMC  and  PGA 
cases  as  compared  to  PSG-OT.  Hence,  we  conclude  that  the 
performance  of  PSG-OT  is  the  best  in  all  these  categories 
when  compared  to  PSG-IMC  and  PGA. 

Note  that  both  PSG-OT  and  PSG-IMC  use  the  consensus 
algorithm  for  estimating  the  current  swarm  distribution,  but 
solving  the  discrete  OTP  using  LP  is  significantly  more 
computationally  expensive  than  evaluating  the  Markov  chain. 
Hence,  solving  the  discrete  OTP  in  a  distributed  manner 
using  PSG-OT  has  the  same  communication  load  as  the 
Markov  chain  based  approaches,  but  has  significantly  more 
computational  load.  It  is  also  evident  from  the  cumulative 
results  of  the  50  simulation  runs  in  Fig.  5  that  PSG-OT 
works  reliably  well  and  achieves  the  desired  objectives  in 
all  simulations  runs. 

During  each  time  step,  each  agent  executes  the  collision 
avoidance  algorithm  to  reach  its  target  bin.  For  visualization 
purposes  a  simpler  problem,  where  100  agents  reach  the 
desired  formation  of  the  letter  “I”,  is  shown  in  Fig.  7. 
Thus  the  collision  avoidance  algorithm  helps  the  agents 
successfully  implement  PSG-OT. 

VII.  Conclusions 


PGA  and  PSG-IMC.  The  cumulative  results  from  50  simu¬ 
lation  runs  are  shown  in  Fig.  5.  The  histogram  plots  of  the 
swarm  distribution  at  different  time  instants,  in  a  sample  run 
of  the  Monte  Carlo  simulation  where  each  agent  executes 
PSG-OT,  are  shown  in  Fig.  6.  The  Hellinger  distance  (HD) 
metric  between  the  current  swarm  distribution  ( J r£)  and  the 
desired  formation  (7r)  (i.e.  Dh( tt,^)  from  (14))  is  used 
to  measure  the  convergence  of  the  swarm.  As  shown  in  the 
HD  plot  in  Fig.  5(a),  the  current  swarm  distribution  rapidly 
converges  to  the  desired  formation  within  a  couple  of  time 
step  when  the  agents  execute  PSG-OT.  On  the  other  hand,  the 
desired  formation  is  almost  achieved  after  approximately  200 
time  steps  when  the  agents  execute  the  PSG-IMC  or  PGA. 


In  this  paper,  we  present  a  new  approach  to  probabilistic 
guidance  of  swarms  of  autonomous  agents  using  distributed 
optimal  transport.  In  the  proposed  PSG-OT,  each  agent  first 
estimates  the  current  swarm  distribution  in  a  distributed 
manner  using  a  consensus  algorithm  and  then  solves  the  OTP 
to  obtain  its  guidance  trajectory  while  avoiding  collisions. 
Each  agent  thus  independently  determines  its  own  trajectory 
so  that  the  overall  swarm  quickly  converges  to  the  desired  for¬ 
mation,  and  the  algorithm  is  robust  to  external  disturbances 
or  damages.  Compared  to  prior  work  on  probabilistic  swarm 
guidance  using  homogeneous  and  inhomogeneous  Markov 
chains,  the  proposed  algorithm  guarantees  faster  convergence, 
reduces  the  number  of  transition  to  achieve  the  formation 
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Figure  7.  Time  evolution  of  the  actual  location  of  100  agents  during  multiple 
collision  avoidance  loops  is  shown.  The  agents  reach  the  desired  formation 
of  the  letter  “I”  within  one  PSG-OT  time  step.  The  transient  agents  are  in 
red  and  the  remaining  stationary  agents  are  in  blue. 


and  minimizes  a  given  cost  function,  while  satisfying  motion 
constraints.  We  also  present  a  novel  distributed  collision 
avoidance  algorithm  for  generating  collision-free  guidance 
trajectories  for  each  agent.  Results  from  multiple  simulation 
runs  demonstrate  the  properties  of  self-repair  capability,  reli¬ 
ability,  convergence,  collision  avoidance,  and  cost  effective¬ 
ness  of  PSG-OT.  Hence  PSG-OT  is  a  complete  collision-free 
solution  for  probabilistic  swarm  guidance  problem.  Future 
work  will  focus  on  finding  appropriate  cost  functions  that 
depend  on  the  dynamics  of  the  agents. 
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